59 research outputs found

    Resilin: Elastic MapReduce over Multiple Clouds

    The MapReduce programming model, introduced by Google, offers a simple and efficient way of performing distributed computation over large data sets. Although Google's implementation is proprietary, MapReduce can be leveraged by anyone using the free and open-source Apache Hadoop framework. To simplify the usage of Hadoop in the cloud, Amazon Web Services offers Elastic MapReduce, a web service enabling users to run MapReduce jobs. Elastic MapReduce takes care of resource provisioning, Hadoop configuration and performance tuning, data staging, fault tolerance, etc. This service drastically lowers the entry barrier to performing MapReduce computations in the cloud, allowing users to concentrate on the problem to solve. However, Elastic MapReduce is restricted to Amazon EC2 resources, and is provided at an additional cost. In this paper, we present Resilin, a system implementing the Elastic MapReduce API with resources from clouds other than Amazon EC2, such as private and scientific clouds. Furthermore, we explore a feature going beyond the current Amazon Elastic MapReduce offering: performing MapReduce computations over multiple distributed clouds. The evaluation of Resilin shows the benefits of running computations on more than one cloud. Although this is not the most efficient way to perform Hadoop computations, it solves the problem of resource availability and adds flexibility regarding the type and price of resources.

    Resilin: Elastic MapReduce for Private and Community Clouds

    The MapReduce programming model, introduced by Google, offers a simple and efficient way of performing distributed computation over large data sets. Although Google's implementation is proprietary, MapReduce can be leveraged by anyone using the free and open-source Apache Hadoop framework. To simplify the usage of Hadoop in the cloud, Amazon Web Services offers Elastic MapReduce, a web service enabling users to run MapReduce jobs. Elastic MapReduce takes care of resource provisioning, Hadoop configuration and performance tuning, data staging, fault tolerance, etc. This service drastically lowers the entry barrier to performing MapReduce computations in the cloud, allowing users to concentrate on the problem to solve. However, Elastic MapReduce is restricted to Amazon EC2 resources, and is provided at an additional cost. In this paper, we present Resilin, a system implementing the Elastic MapReduce API with resources from clouds other than Amazon EC2, such as private and community clouds. Furthermore, we explore a feature going beyond the current Amazon Elastic MapReduce offering: performing MapReduce computations over multiple distributed clouds.

    Hydroinformatics On The Cloud: Data Integration, Modeling And Information Communication For Flood Risk Management

    The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to flood inundation maps, real-time flood conditions, flood warnings and forecasts, flood-related data, information and interactive visualizations for communities in Iowa. The key elements of the IFIS are: (1) flood inundation maps, (2) autonomous “bridge sensors” that monitor water level in streams and rivers in real time, and (3) real-time flood forecasting models capable of providing flood warning to over 1000 communities in Iowa. The IFIS represents a hybrid of file and compute servers, including a High Performance Computing cluster, codes in different languages, data streams and web services, databases, scripts and visualizations. The IFIS processes raw data (50 GB/day) from NEXRAD radars, creates rainfall maps (3 GB/day) every 5 minutes, and integrates real-time data from over 600 sensors in Iowa. Even though the IFIS serves over 75,000 users in Iowa using local infrastructure, cloud computing can improve scalability, speed, cost efficiency, accessibility, security, resiliency and uptime. In this collaborative study between the Iowa Flood Center and the Nimbus team at the Argonne National Laboratory, we analyzed the feasibility and price/performance of moving the MPI-based computations to the cloud, and assessed the response times of our interactive web-based system. Moving the system to the cloud, and making it independent and portable, would enable us to share our model easily with the flood research community. This presentation provides an overview of the tools and interfaces in the IFIS, and of the transition of the IFIS from a local infrastructure to a cloud computing environment.

    Evaluating Streaming Strategies for Event Processing across Infrastructure Clouds

    Infrastructure clouds revolutionized the way in which we approach resource procurement by providing an easy way to lease compute and storage resources on short notice, for a short amount of time, and on a pay-as-you-go basis. This new opportunity, however, introduces new performance trade-offs. Making the right choices in leveraging different types of storage available in the cloud is particularly important for applications that depend on managing large amounts of data within and across clouds. An increasing number of such applications conform to a pattern in which data processing relies on streaming the data to a compute platform where a set of similar operations is repeatedly applied to independent chunks of data. This pattern is evident in virtual observatories such as the Ocean Observatory Initiative, in cases when new data is evaluated against existing features in geospatial computations or when experimental data is processed as a series of time events. In this paper, we propose two strategies for efficiently implementing such streaming in the cloud and evaluate them in the context of an ATLAS application processing experimental data. Our results show that choosing the right cloud configuration can improve overall application performance by as much as three times.

    Evaluating Streaming Strategies for Event Processing across Infrastructure Clouds

    Infrastructure clouds revolutionized the way in which we approach resource procurement by providing an easy way to lease compute and storage resources on short notice, for a short amount of time, and on a pay-as-you-go basis. This new opportunity, however, introduces new performance trade-offs. Making the right choices in leveraging different types of storage available in the cloud is particularly important for applications that depend on managing large amounts of data within and across clouds. An increasing number of such applications conform to a pattern in which data processing relies on streaming the data to a compute platform where a set of similar operations is repeatedly applied to independent chunks of data. This pattern is evident in virtual observatories such as the Ocean Observatory Initiative, in cases when new data is evaluated against existing features in geospatial computations or when experimental data is processed as a series of time events. In this paper, we propose two strategies for efficiently implementing such streaming in the cloud and evaluate them in the context of an ATLAS application processing experimental data. Our results show that choosing the right cloud configuration can improve overall application performance by as much as three times.

    Plates-formes d'exécution dynamiques sur des fédérations de nuages informatiques

    The increasing need for computing power has led to parallel and distributed computing, which harnesses the power of large computing infrastructures in a concurrent manner. Recently, virtualization technologies have increased in popularity, thanks to hypervisor improvements, the shift to multi-core architectures, and the spread of Internet services. This has led to the emergence of cloud computing, a paradigm offering computing resources in an elastic, on-demand approach while charging only for consumed resources. In this context, this thesis proposes four contributions to leverage the power of multiple clouds. They follow two directions: the creation of elastic execution platforms on top of federated clouds, and inter-cloud live migration for using them in a dynamic manner. We propose mechanisms to efficiently build elastic execution platforms on top of multiple clouds using the sky computing federation approach. Resilin is a system for creating and managing MapReduce execution platforms on top of federated clouds, making it easy to execute MapReduce computations without interacting with low-level cloud interfaces. We propose mechanisms to reconfigure virtual network infrastructures in the presence of inter-cloud live migration, implemented in the ViNe virtual network from the University of Florida. Finally, Shrinker is a live migration protocol improving the migration of virtual clusters over wide area networks by eliminating duplicated data between virtual machines.

    Building Dynamic Computing Infrastructures over Distributed Clouds

    The emergence of cloud computing infrastructures brings new ways to build and manage computing systems, with the flexibility offered by virtualization technologies. In this context, this PhD thesis focuses on two principal objectives. First, leveraging virtualization and cloud computing infrastructures to build distributed large-scale computing platforms from multiple cloud providers, making it possible to run software requiring large amounts of computation power. Second, developing mechanisms to make these infrastructures more dynamic. These mechanisms, providing inter-cloud live migration, offer new ways to exploit the inherent dynamic nature of distributed clouds.

    Plates-formes d'exécution dynamiques sur des fédérations de nuages informatiques

    The increasing need for computing power has led to parallel and distributed computing, which harnesses the power of large computing infrastructures in a concurrent manner. Recently, virtualization technologies have increased in popularity, thanks to hypervisor improvements, the shift to multi-core architectures, and the spread of Internet services. This has led to the emergence of cloud computing, a paradigm offering computing resources in an elastic, on-demand approach while charging only for consumed resources. In this context, this thesis proposes four contributions to leverage the power of multiple clouds. They follow two directions: the creation of elastic execution platforms on top of federated clouds, and inter-cloud live migration for using them in a dynamic manner. We propose mechanisms to efficiently build elastic execution platforms on top of multiple clouds using the sky computing federation approach. Resilin is a system for creating and managing MapReduce execution platforms on top of federated clouds, making it easy to execute MapReduce computations without interacting with low-level cloud interfaces. We propose mechanisms to reconfigure virtual network infrastructures in the presence of inter-cloud live migration, implemented in the ViNe virtual network from the University of Florida. Finally, Shrinker is a live migration protocol improving the migration of virtual clusters over wide area networks by eliminating duplicated data between virtual machines.
    RENNES1-BU Sciences Philo (352382102) / Sudoc, France